Abstract. In this paper we provide the answer to the following
question: Given a noisy channel $P_{Y|X}$ and $\varepsilon > 0$, how many bits
can be transmitted with an error of at most $\varepsilon$ by a single use of
the channel?

I. Introduction 

Shannon entropy and information [14] have proven highly
significant in the scenario of i.i.d. distributions and
asymptotic rates. Unfortunately, however, these two assumptions
fail to be realistic in many real-world scenarios. First
of all, a given primitive or random experiment is actually 
available only a limited number of times, and an asymptotic 
analysis has, therefore, a limited significance. Second, the 
assumption that a certain primitive is repeated independently 
many times is not always realistic. An important example 
is cryptography, where this assumption leads to a strong 
restriction on the adversary's behavior and possibilities. 

In [6], the assumption of independence has been dropped, 
but the analysis still remains asymptotic. In the present paper, 
we drop both assumptions at once and consider the case 
where a certain information-theoretic primitive, such as a 
communication channel, or random experiment is available 
only once. This single-serving case has also been called 
"single shot" in the literature. 

Let us first consider an example from cryptography or, more 
precisely, information-theoretic key agreement from correlated 
pieces of information. Let two parties, Alice and Bob, as well 
as an adversary, Eve, have access to n independent realizations 
of random variables X, Y, and Z, respectively, with joint 
probability distribution $P_{XYZ}$. Moreover, authenticated but
public communication from Alice to Bob (but not in the other
direction) is possible. Their goal is to generate a common
secret key of length $\ell(n)$, i.e., a uniform string about which
the adversary is virtually ignorant. Asymptotically, for large
$n$, the rate at which such a key can be generated is given by

$$\lim_{n \to \infty} \frac{\ell(n)}{n} = \max_{YZ \leftarrow X \leftarrow UV} \bigl( H(U|ZV) - H(U|YV) \bigr) \qquad (1)$$

(see, for instance, [17], [3], [1], [10]). 

Let us now consider the non-asymptotic case where n = 1, 
i.e., the random experiment defined by $P_{XYZ}$ is only run once.
How many virtually secret bits can then be extracted? First of
all, note that (1) fails to provide the correct answer in this
case. To see this, assume, e.g., that $X$ is uniformly distributed
and that $Y = X$, whereas $Z = X$ holds with probability 1/2
(and $Z = \perp$ otherwise). Then the right-hand side of (1) is
non-zero, but no secret can be extracted at all by Alice and 
Bob since, with probability 1/2, Eve knows everything. We 
conclude that Shannon entropy fails to be the right measure 
in this setting. But what does it have to be replaced by? 
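
The example can be checked numerically. The following is a minimal sketch, assuming that $X$ is a single uniform bit, that Eve's "otherwise" value is a fixed placeholder carrying no information (called `ERASED` here, a hypothetical name), and that we pick $U = X$ with constant $V$, which gives a lower bound on the maximum in (1).

```python
import math
from collections import defaultdict

def conditional_entropy(joint):
    """H(A|B) in bits for a joint distribution given as {(a, b): probability}."""
    p_b = defaultdict(float)
    for (_, b), p in joint.items():
        p_b[b] += p
    return -sum(p * math.log2(p / p_b[b]) for (_, b), p in joint.items() if p > 0)

ERASED = "erased"  # assumed placeholder for the value Eve sees when she learns nothing

# X is a uniform bit and Y = X.
p_xy = {(x, x): 0.5 for x in (0, 1)}

# Z = X with probability 1/2, and Z = ERASED otherwise.
p_xz = {}
for x in (0, 1):
    p_xz[(x, x)] = 0.25
    p_xz[(x, ERASED)] = 0.25

# Shannon-entropy expression H(U|ZV) - H(U|YV) for U = X and constant V.
rate = conditional_entropy(p_xz) - conditional_entropy(p_xy)
print(rate)
```

Under these assumptions the Shannon expression evaluates to half a bit, even though Eve learns $X$ completely with probability 1/2, so no single-shot key can be virtually secret.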

Results on randomness extraction, also known as privacy 
amplification [2], [9], [8], indicate that the right answer might 
be given by so-called min-entropies rather than Shannon 
entropies. Indeed, it is shown in [13] that the so-called
conditional smooth min- and max-entropies [12], [13] $H^{\varepsilon}_{\max}$ and
$H^{\varepsilon}_{\min}$ (for the precise definitions see below) replace Shannon
entropy in this case; the achievable secret-key length $\ell$ is
approximated (up to a term $\log(1/\varepsilon)$, where $\varepsilon$ is the security
of the final key) by

$$\ell \approx \max_{YZ \leftarrow X \leftarrow UV} \bigl( H^{\varepsilon}_{\min}(U|ZV) - H^{\varepsilon}_{\max}(U|YV) \bigr) .$$

It is the goal of this paper to show that smooth min-
and max-entropies have a similar significance in communication
theory, i.e., they can be used for the characterization of
communication tasks in a single-serving setting. Among others, we
consider the following question: Given a noisy communication
channel $W = P_{Y|X}$ and $\varepsilon > 0$, what is the maximum number
$C^{\varepsilon}_{\mathrm{comm}}(W)$ of bits that can be transmitted with error at most
$\varepsilon$ by a single use of the channel? Recall that, in the i.i.d. case,
i.e., if the channel can be used many times independently, an
asymptotic answer to this question is given by the channel
capacity $C^{\mathrm{asym}}_{\mathrm{comm}}(W)$, which can be expressed by the well-known
formula [14]

$$C^{\mathrm{asym}}_{\mathrm{comm}}(W) = \max_{P_X} \bigl( H(X) - H(X|Y) \bigr) .$$

As we shall see, the answer for the single-serving case looks 
very similar, but the (conditional) Shannon entropies are 
replaced by smooth min- and max-entropies: 

$$C^{\varepsilon}_{\mathrm{comm}}(W) \approx \max_{P_X} \bigl( H^{\varepsilon}_{\min}(X) - H^{\varepsilon}_{\max}(X|Y) \bigr) .$$

II. Notation and Previous Work 
A. Smooth Min- and Max-Entropies 

Let $X$ be a random variable with probability distribution
$P_X$. The max-entropy of $X$ is defined as the binary logarithm
of the size of the support of $P_X$, i.e.,

$$H_{\max}(X) = \log \bigl| \{ x \in \mathcal{X} : P_X(x) > 0 \} \bigr| .$$

Similarly, the min-entropy of $X$ is given by the negative
logarithm of the maximum probability of $P_X$, i.e.,

$$H_{\min}(X) = -\log \max_x P_X(x) .$$

Note that $H_{\min}(X) \le H(X) \le H_{\max}(X)$, i.e., the min- and
max-entropies are lower and upper bounds for the Shannon
entropy (and also for any Rényi entropy of order $\alpha \in [0, \infty]$),
respectively.
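
As an illustration of these definitions, the following sketch evaluates the three entropies for a small distribution and checks the ordering stated above. The distribution `p_x` and the function names are chosen here for illustration only.

```python
import math

def shannon_entropy(p):
    """H(X) in bits for a distribution given as {value: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def min_entropy(p):
    """H_min(X): negative binary logarithm of the largest probability."""
    return -math.log2(max(p.values()))

def max_entropy(p):
    """H_max(X): binary logarithm of the size of the support."""
    return math.log2(sum(1 for q in p.values() if q > 0))

p_x = {"a": 0.5, "b": 0.25, "c": 0.25}  # illustrative distribution
print(min_entropy(p_x), shannon_entropy(p_x), max_entropy(p_x))
```

For this distribution the min-entropy is 1 bit, the Shannon entropy is 1.5 bits, and the max-entropy is $\log 3$ bits.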

For random variables $X$ and $Y$ with joint distribution $P_{XY}$,
the "conditional" versions of these entropic quantities are
defined as

$$H_{\max}(X|Y) = \max_y H_{\max}(X|Y=y) ,$$

$$H_{\min}(X|Y) = \min_y H_{\min}(X|Y=y) .$$
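
The worst-case nature of these conditional quantities can be seen in a small sketch. The joint distribution below is a made-up example in which $X$ is uniform over four values when $Y = 0$ and fully determined when $Y = 1$.

```python
import math
from collections import defaultdict

def conditional_max_entropy(joint):
    """H_max(X|Y): largest log-support size of X over all values y."""
    support = defaultdict(int)
    for (_, y), p in joint.items():
        if p > 0:
            support[y] += 1
    return math.log2(max(support.values()))

def conditional_min_entropy(joint):
    """H_min(X|Y): smallest min-entropy of X given Y = y over all values y."""
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p
    worst = max(p / p_y[y] for (_, y), p in joint.items() if p > 0)
    return -math.log2(worst)

# Illustrative joint distribution {(x, y): probability}.
joint = {(x, 0): 0.125 for x in range(4)}
joint[(0, 1)] = 0.5

print(conditional_max_entropy(joint), conditional_min_entropy(joint))
```

Here the conditional max-entropy is governed by the value $y = 0$ and equals 2 bits, while the conditional min-entropy is governed by $y = 1$ and equals 0.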

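In [13], max- and min-entropies have been generalized to
so-called smooth max- and min-entropies. For any $\varepsilon \ge 0$, they
are defined by optimizing the "non-smooth" quantities over
all random variables $\bar{X}$ and $\bar{Y}$ which are equal to $X$ and $Y$
except with probability $\varepsilon$, i.e.,

$$H^{\varepsilon}_{\max}(X|Y) = \min_{\bar{X}\bar{Y} :\, \Pr[\bar{X}\bar{Y} \ne XY] \le \varepsilon} H_{\max}(\bar{X}|\bar{Y}) ,$$

$$H^{\varepsilon}_{\min}(X|Y) = \max_{\bar{X}\bar{Y} :\, \Pr[\bar{X}\bar{Y} \ne XY] \le \varepsilon} H_{\min}(\bar{X}|\bar{Y}) .$$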



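Equivalently, smooth max- and min-entropies can be expressed
in terms of an optimization over events $\mathcal{E}$ that have probability
at least $1 - \varepsilon$. Let $P_{X\mathcal{E}|Y=y}(x)$ be the probability that $X = x$
and the event $\mathcal{E}$ occurs, conditioned on $Y = y$. We then have

$$H^{\varepsilon}_{\max}(X|Y) = \min_{\mathcal{E} :\, \Pr(\mathcal{E}) \ge 1-\varepsilon} \, \max_y \, \log \bigl| \{ x : P_{X\mathcal{E}|Y=y}(x) > 0 \} \bigr| ,$$

$$H^{\varepsilon}_{\min}(X|Y) = \max_{\mathcal{E} :\, \Pr(\mathcal{E}) \ge 1-\varepsilon} \, \min_y \, \min_x \bigl( -\log P_{X\mathcal{E}|Y=y}(x) \bigr) .$$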
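
For the unconditional case (trivial $Y$), the event-based formulation can be evaluated directly. The sketch below rests on two assumptions that are not spelled out in the text: the event may depend on auxiliary randomness, so that any sub-distribution below $P_X$ with total weight at least $1 - \varepsilon$ is admissible, and $0 \le \varepsilon < 1$. Under these assumptions, the smooth max-entropy discards the least likely values, and the smooth min-entropy cuts the largest probabilities down to a common level.

```python
import math

def smooth_max_entropy(p, eps):
    """log2 of the smallest number of values carrying mass >= 1 - eps."""
    mass, count = 0.0, 0
    for q in sorted(p.values(), reverse=True):
        if mass >= 1.0 - eps - 1e-12:
            break
        mass += q
        count += 1
    return math.log2(count)

def smooth_min_entropy(p, eps):
    """-log2 of the lowest cap level reachable by removing mass eps from the peaks."""
    probs = sorted(p.values(), reverse=True) + [0.0]
    total = 0.0
    for k in range(1, len(probs)):
        total += probs[k - 1]
        level = (total - eps) / k  # common level after cutting the k largest values
        if level >= probs[k] - 1e-12:
            return -math.log2(level)
    raise ValueError("eps must satisfy 0 <= eps < 1")

p_x = {"a": 0.5, "b": 0.25, "c": 0.25}  # illustrative distribution
print(smooth_max_entropy(p_x, 0.25), smooth_min_entropy(p_x, 0.25))
```

For $\varepsilon = 0$ both functions reduce to the non-smooth quantities. In this example a smoothing of $\varepsilon = 1/4$ lowers the max-entropy from $\log 3$ to 1 bit and raises the min-entropy from 1 to 2 bits.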

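These smooth entropies have properties similar to Shannon
entropy. This is in contrast to the usual, non-smooth min- and
max-entropies, which have many counterintuitive properties
that make them less useful in many contexts. For example,
the chain rule $H(X|Y) = H(XY) - H(Y)$ translates to [13]

$$H^{\varepsilon}_{\max}(X|Y) \le H^{\varepsilon_1}_{\max}(XY) - H^{\varepsilon_2}_{\min}(Y) + \log(1/(\varepsilon - \varepsilon_1 - \varepsilon_2))$$

and

$$H^{\varepsilon_1}_{\min}(XY) - H^{\varepsilon_2}_{\max}(Y) - \log(1/(\varepsilon - \varepsilon_1 - \varepsilon_2)) \le H^{\varepsilon}_{\min}(X|Y) \le H^{\varepsilon+\varepsilon'}_{\min}(XY) - H^{\varepsilon'}_{\min}(Y) .$$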

B. Operational Interpretation of Smooth Max- and Min-Entropies

In [15] it was shown that the rate at which many independent
realizations of $X$ can be compressed is asymptotically
$H(X|Y)$ if the decoder is provided with side information
$Y$. It is easy to see that $H(X|Y)$ also is the rate at which
uniform randomness can be extracted from $X$, in such a way
that it is independent of $Y$. In [13], it was shown that the
smooth entropies $H^{\varepsilon}_{\max}$ and $H^{\varepsilon}_{\min}$ quantify compression and
randomness extraction, respectively, in the single-serving case.
More precisely, let $H^{\varepsilon}_{\mathrm{comp}}(X|Y)$ be the length of a bit string
needed to store one instance of $X$ such that $X$ can later be
recovered with an error of at most $\varepsilon$ using this string and $Y$.
This quantity is then roughly equal to $H^{\varepsilon}_{\max}$, i.e.,

$$H^{\varepsilon}_{\max}(X|Y) \le H^{\varepsilon}_{\mathrm{comp}}(X|Y) \le H^{\varepsilon'}_{\max}(X|Y) + \log(1/(\varepsilon - \varepsilon')) .$$

Similarly, let $H^{\varepsilon}_{\mathrm{ext}}(X|Y)$ be the maximum length of a string
that can be computed from $X$, such that this string is uniformly
distributed and independent of $Y$, with an error of at most $\varepsilon$.
We then have

$$H^{\varepsilon'}_{\min}(X|Y) - 2\log(1/(\varepsilon - \varepsilon')) \le H^{\varepsilon}_{\mathrm{ext}}(X|Y) \le H^{\varepsilon}_{\min}(X|Y) .$$

C. Common Information 

The common information is the rate at which uniform
random bits can be extracted both from $X^n$ and $Y^n$, which
come from independent repeated realizations of the random
experiment $P_{XY}$, without communicating. It has been shown
in [5] that the common information is equal to the maximum
entropy of a common random variable that both players can
compute. As in [4], [16], we will denote this random variable
by $X \wedge Y$, i.e., the common information of $X$ and $Y$ is given
by $H(X \wedge Y)$.
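
A common way to obtain such a common random variable is to take the connected component of the bipartite graph that links $x$ and $y$ whenever $P_{XY}(x, y) > 0$. The sketch below assumes this standard construction; the function names and the example distribution are chosen here for illustration.

```python
import math
from collections import defaultdict

def common_part(joint):
    """Label every x and y by its connected component in the support graph."""
    parent = {}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (x, y), p in joint.items():
        if p <= 0:
            continue
        for v in (("x", x), ("y", y)):
            parent.setdefault(v, v)
        parent[find(("x", x))] = find(("y", y))
    return {v: find(v) for v in list(parent)}

def common_information(joint):
    """Shannon entropy of the component label, i.e., of the common part."""
    label = common_part(joint)
    p_c = defaultdict(float)
    for (x, _), p in joint.items():
        if p > 0:
            p_c[label[("x", x)]] += p
    return -sum(q * math.log2(q) for q in p_c.values())

# Two blocks: x, y in {0, 1} are fully mixed; x = 2 always comes with y = 2.
joint = {(x, y): 0.125 for x in (0, 1) for y in (0, 1)}
joint[(2, 2)] = 0.5
print(common_information(joint))
```

In this example both players can tell which of the two blocks occurred, and nothing more, so the common information is 1 bit.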

It is shown in [16] that the common information can be
used to characterize the zero-error capacity $C^{\mathrm{asym}}_{0\text{-}\mathrm{comm}}(W)$ of a
channel $W$ as follows:

$$C^{\mathrm{asym}}_{0\text{-}\mathrm{comm}}(W) = \lim_{n \to \infty} \max_{P_{X^n}} \frac{1}{n} H(X^n \wedge Y^n) .$$

Note that the usual (asymptotic) channel capacity $C^{\mathrm{asym}}_{\mathrm{comm}}(W)$
of $W$ is given by a similar expression, where the common
information is replaced by the mutual information, i.e.,

$$C^{\mathrm{asym}}_{\mathrm{comm}}(W) = \max_{P_X} I(X;Y) = \lim_{n \to \infty} \max_{P_{X^n}} \frac{1}{n} I(X^n;Y^n) .$$



III. Extractable Common Randomness

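We denote by $C^{\varepsilon}_{\mathrm{ext}}(X;Y)$ the maximum amount of uniform
randomness that can be extracted from $X$ and $Y$, without any
communication, with an error of at most $\varepsilon$. Asymptotically, it
follows from [5] that

$$\lim_{\varepsilon \to 0} \lim_{n \to \infty} \frac{C^{\varepsilon}_{\mathrm{ext}}(X^n;Y^n)}{n} = H(X \wedge Y) .$$

In the following, we analyze the quantity $C^{\varepsilon}_{\mathrm{ext}}(X;Y)$ for
the single-serving case. First, we will show that $C^{\varepsilon}_{\mathrm{ext}}(X;Y)$
is characterized by the following quantity.

Definition 1:

$$C^{\varepsilon}_{\min}(X;Y) = \max_{\bar{X}\bar{Y} :\, \Pr[\bar{X}\bar{Y} \ne XY] \le \varepsilon} H_{\min}(\bar{X} \wedge \bar{Y}) .$$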

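Theorem 1: For all random variables $X$ and $Y$, and for all
$\varepsilon'$ and $\varepsilon > \varepsilon'$, we have

$$C^{\varepsilon}_{\mathrm{ext}}(X;Y) \ge C^{\varepsilon'}_{\min}(X;Y) - 2\log(1/(\varepsilon - \varepsilon')) .$$

Proof: Let Alice and Bob have $\bar{X}$ and $\bar{Y}$, respectively,
where $\bar{X}$ and $\bar{Y}$ attain the maximum in Definition 1 for $\varepsilon'$.
They both can calculate $\bar{X} \wedge \bar{Y}$ and extract at least
$H_{\min}(\bar{X} \wedge \bar{Y}) - 2\log(1/(\varepsilon - \varepsilon'))$ bits with an error of at most $\varepsilon - \varepsilon'$.
Since $\Pr[\bar{X}\bar{Y} \ne XY] \le \varepsilon'$, we get at most an additional error
of $\varepsilon'$ if they use $X$ and $Y$ instead of $\bar{X}$ and $\bar{Y}$. The total error
is, therefore, at most $\varepsilon$. ■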
Theorem 2: For all random variables $X$ and $Y$, and for all
$\varepsilon$, we have

$$C^{\varepsilon}_{\mathrm{ext}}(X;Y) \le C^{\varepsilon}_{\min}(X;Y) .$$

Proof: Let us assume that Alice and Bob can extract more
than $C^{\varepsilon}_{\min}(X;Y)$ bits with an error of at most $\varepsilon$. Therefore there
exist functions $f$ and $g$ such that with probability $1 - \varepsilon$ both
functions output the same uniform random string $R$ of length
bigger than $C^{\varepsilon}_{\min}(X;Y)$, which means that there exist $\bar{X}, \bar{Y}$
such that $\Pr[(\bar{X}, \bar{Y}) \ne (X, Y)] \le \varepsilon$ and $f(\bar{X}) = g(\bar{Y}) = R$.
As shown in Lemma 1 of [16], this implies that $R$ can be
computed from $\bar{X} \wedge \bar{Y}$, that is, there exists a function $h$ such
that $R = h(\bar{X} \wedge \bar{Y})$. The function $h$ could thus be used to
extract more than $H_{\min}(\bar{X} \wedge \bar{Y})$ bits from $\bar{X} \wedge \bar{Y}$, which is impossible. ■

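In the following, we derive an upper bound on $C^{\varepsilon}_{\min}(X;Y)$
in terms of smooth min- and max-entropies.

Lemma 1: For all random variables $X$ and $Y$, and for all
$\varepsilon$, $\varepsilon_1$, and $\varepsilon_2$, we have

$$C^{\varepsilon}_{\min}(X;Y) \le H^{\varepsilon_2}_{\max}(X) - H^{\varepsilon_1+\varepsilon_2+2\varepsilon}_{\max}(X|Y) + \log(1/\varepsilon_1) .$$

Proof: Let $\bar{X}$ and $\bar{Y}$ be the random variables that
attain the maximum in the definition of $C^{\varepsilon}_{\min}(X;Y)$, and let $C = \bar{X} \wedge \bar{Y}$. We have

$$H_{\min}(C) \le H^{\varepsilon_2}_{\max}(XC) - H^{\varepsilon_1+\varepsilon_2}_{\max}(X|C) + \log(1/\varepsilon_1) .$$

$C$ is a function of $X$ and of $Y$ with probability at least $1 - \varepsilon$.
Therefore, we can bound

$$H^{\varepsilon_2}_{\max}(XC) \le H^{\varepsilon_2-\varepsilon}_{\max}(X)$$

and

$$H^{\varepsilon_1+\varepsilon_2}_{\max}(X|C) \ge H^{\varepsilon_1+\varepsilon_2+\varepsilon}_{\max}(X|Y) .$$

We get

$$H_{\min}(C) \le H^{\varepsilon_2-\varepsilon}_{\max}(X) - H^{\varepsilon_1+\varepsilon_2+\varepsilon}_{\max}(X|Y) + \log(1/\varepsilon_1) .$$

The statement follows when $\varepsilon$ is added to $\varepsilon_2$. ■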
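No non-trivial lower bound is known so far for $C^{\varepsilon}_{\mathrm{ext}}(X;Y)$.
However, one can bound $\max_{P_X} C^{\varepsilon}_{\min}(X;Y)$. This will turn
out to be useful for the considerations in the next section.

Lemma 2: For all conditional distributions $P_{Y|X}$ and for all
$\varepsilon_1$, $\varepsilon_2$, and $\varepsilon_3$, we have

$$\max_{P_X} C^{\varepsilon_1+\varepsilon_2+\varepsilon_3}_{\min}(X;Y) \ge \max_{P_X} \bigl( H^{\varepsilon_1}_{\min}(X) - H^{\varepsilon_2}_{\max}(X|Y) \bigr) - \log(1/\varepsilon_3) .$$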
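Proof: Let $P_X$ be the distribution that maximizes
$H^{\varepsilon_1}_{\min}(X) - H^{\varepsilon_2}_{\max}(X|Y)$. There exist random variables $\bar{X}$
and $\bar{Y}$ with $\Pr[\bar{X}\bar{Y} \ne XY] \le \varepsilon_1 + \varepsilon_2$ such that
$H_{\min}(\bar{X}) - H_{\max}(\bar{X}|\bar{Y}) = H^{\varepsilon_1}_{\min}(X) - H^{\varepsilon_2}_{\max}(X|Y)$. We
choose, independently and according to the distribution $P_{\bar{X}}$,
$2^{H_{\min}(\bar{X}) - H_{\max}(\bar{X}|\bar{Y}) - \log(1/\varepsilon_3)}$ values. Let $S$ be the set of these
values and let $\hat{X}$ be a random variable that takes on a value
in $S$ with equal probability. Since $P_{\bar{X}}(x) \cdot 2^{H_{\min}(\bar{X})} \le 1$, the
probability that a value $x$ chosen according to $P_{\bar{X}}$ is in $S$ is
at most

$$P_{\bar{X}}(x) \cdot 2^{H_{\min}(\bar{X}) - H_{\max}(\bar{X}|\bar{Y}) - \log(1/\varepsilon_3)} \le 2^{-H_{\max}(\bar{X}|\bar{Y})} \varepsilon_3 .$$

Let $x$ and $y$ be chosen according to the distribution
$P_{\hat{X}} P_{Y|X}$, and let $\hat{Y}$ denote the corresponding channel output.
The probability that there exists a value $x' \in S$
such that $x' \ne x$ and $P_{Y|X}(y|x') > 0$ is at most
$2^{H_{\max}(\bar{X}|\bar{Y})} \cdot 2^{-H_{\max}(\bar{X}|\bar{Y})} \varepsilon_3 = \varepsilon_3$. Therefore, there exists a
function $f$ such that $\Pr[\hat{X} \ne f(\hat{Y})] \le \varepsilon_3$ holds, and we have

$$C^{\varepsilon_3}_{\min}(\hat{X};\hat{Y}) = H_{\min}(\hat{X}) = H^{\varepsilon_1}_{\min}(X) - H^{\varepsilon_2}_{\max}(X|Y) - \log(1/\varepsilon_3) .$$

The statement now follows from the fact that

$$C^{\varepsilon_1+\varepsilon_2+\varepsilon_3}_{\min}(X;Y) \ge C^{\varepsilon_3}_{\min}(\hat{X};\hat{Y}) .$$

■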

IV. Communication 

Let us now come back to the question posed in the abstract.
We define the $\varepsilon$-single-serving channel capacity of a channel
$W = P_{Y|X}$, denoted $C^{\varepsilon}_{\mathrm{comm}}(W)$, as the maximum number of
bits (i.e., the logarithm of the number of symbols) that can be
transmitted in a single use of $W$, such that every symbol can
be decoded with an error of at most $\varepsilon$. Theorem 3 shows the
connection between the extractable common randomness
and the single-serving channel capacity, similar to the connection
between the common information and the zero-error capacity
shown in [16].
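
For very small alphabets the definition can be evaluated by exhaustive search. The sketch below is an illustration only: it assumes deterministic decoders, represents the channel as a matrix `W[x][y]` $= P_{Y|X}(y|x)$, and uses function names that are not part of the paper. Its running time is exponential in the alphabet sizes.

```python
import itertools
import math

def decodable(W, code, eps):
    """True if some deterministic decoder has error <= eps for every codeword."""
    n_out = len(W[0])
    for decoder in itertools.product(code, repeat=n_out):
        # decoder[y] is the codeword announced when output y is observed
        if all(
            sum(W[x][y] for y in range(n_out) if decoder[y] == x) >= 1 - eps - 1e-12
            for x in code
        ):
            return True
    return False

def single_serving_capacity(W, eps):
    """log2 of the largest code that can be decoded with per-symbol error <= eps."""
    best = 0
    for size in range(1, len(W) + 1):
        codes = itertools.combinations(range(len(W)), size)
        if any(decodable(W, code, eps) for code in codes):
            best = size
    return math.log2(best)

# Binary symmetric channel with crossover probability 0.1.
bsc = [[0.9, 0.1], [0.1, 0.9]]
print(single_serving_capacity(bsc, 0.1), single_serving_capacity(bsc, 0.05))
```

For this channel one bit can be sent in a single use if an error of 0.1 is tolerated, whereas for $\varepsilon = 0.05$ only the trivial one-symbol code remains and the capacity is 0.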

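Theorem 3: For all channels $W = P_{Y|X}$ and for $\varepsilon' < \varepsilon$,
we have

$$\max_{P_X} C^{\varepsilon'}_{\min}(X;Y) - \log(\varepsilon/(\varepsilon - \varepsilon')) \le C^{\varepsilon}_{\mathrm{comm}}(W) \le \max_{P_X} C^{\varepsilon}_{\min}(X;Y) .$$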

Proof: Let $\mathcal{C} \subseteq \mathcal{X}$ be a code that can be decoded with
an error of at most $\varepsilon$ and let $X$ be uniformly distributed over
$\mathcal{C}$. Then there exists a $\bar{Y}$ with $\Pr[\bar{Y} = Y] \ge 1 - \varepsilon$, such that
$X = X \wedge \bar{Y}$. It follows that

$$\max_{P_X} C^{\varepsilon}_{\min}(X;Y) \ge C^{\varepsilon}_{\mathrm{comm}}(W) .$$

Let $P_X$ be a distribution that attains $\max_{P_X} C^{\varepsilon'}_{\min}(X;Y)$,
and let $\bar{X}$, $\bar{Y}$ be random variables for which $H_{\min}(\bar{X} \wedge \bar{Y}) =
C^{\varepsilon'}_{\min}(X;Y)$ holds as well as $\Pr[\bar{X}\bar{Y} = XY] \ge 1 - \varepsilon'$.
Let $C := \bar{X} \wedge \bar{Y}$. We can write $C$ as a combination of
uniform random variables $C_i$, with $H_{\min}(C_i) = H_{\min}(C)$.
More precisely, we have $P_C = \sum_i \lambda_i P_{C_i}$, where $P_{C_i}(x) \in
\{0, 2^{-H_{\min}(C)}\}$ for all $x$. The support of the random variable
$C_i$ which minimizes the error probability defines a code
$\mathcal{C}_i \subseteq \mathcal{X}$ that can be decoded with an error of at most $\varepsilon'$, if
the input is uniformly distributed. Since we need a code that
works for any input distribution, we delete all symbols which
get decoded with an error bigger than $\varepsilon > \varepsilon'$. From the Markov
inequality it follows that the reduced code still contains at least
$\frac{\varepsilon - \varepsilon'}{\varepsilon} 2^{H_{\min}(C)}$ symbols. It follows that

$$C^{\varepsilon}_{\mathrm{comm}}(W) \ge \max_{P_X} C^{\varepsilon'}_{\min}(X;Y) - \log(\varepsilon/(\varepsilon - \varepsilon')) .$$
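
The expurgation step in this proof can be illustrated numerically. The per-codeword error probabilities below are hypothetical values chosen so that their average equals $\varepsilon' = 0.1$; the sketch removes all codewords whose error exceeds $\varepsilon = 0.2$ and compares the number of survivors with the Markov bound.

```python
def expurgate(errors, eps):
    """Keep only the codewords whose individual error probability is at most eps."""
    return {c: e for c, e in errors.items() if e <= eps}

# Hypothetical per-codeword error probabilities under a fixed decoder.
errors = {"c0": 0.0, "c1": 0.02, "c2": 0.05, "c3": 0.33}

eps_prime = sum(errors.values()) / len(errors)  # average error for a uniform input
eps = 0.2                                        # target per-codeword error

kept = expurgate(errors, eps)
markov_bound = (eps - eps_prime) / eps * len(errors)
print(len(kept), markov_bound)
```

Three of the four codewords survive here, which is consistent with the guaranteed fraction $(\varepsilon - \varepsilon')/\varepsilon = 1/2$; the logarithm of this fraction is the term $\log(\varepsilon/(\varepsilon - \varepsilon'))$ lost in the lower bound.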



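From Lemma 1 we have

$$\max_{P_X} C^{\varepsilon}_{\min}(X;Y) \le \max_{P_X} \bigl( H^{\varepsilon_2}_{\max}(X) - H^{\varepsilon_1+\varepsilon_2+2\varepsilon}_{\max}(X|Y) \bigr) + \log\frac{1}{\varepsilon_1} .$$

From the same argument as in the proof of Theorem 3, it follows
that $\max_{P_X} C^{\varepsilon}_{\min}(X;Y)$ is maximized by a distribution
where all $x$ with positive probability have equal probabilities.
Therefore, we have $H^{\varepsilon_2}_{\max}(X) = H^{\varepsilon_2}_{\min}(X)$ and, with Lemma 2 for the lower bound, get

$$\max_{P_X} \bigl( H^{\varepsilon'}_{\min}(X) - H^{\varepsilon''}_{\max}(X|Y) \bigr) - \log\frac{1}{\varepsilon - \varepsilon' - \varepsilon''} \le \max_{P_X} C^{\varepsilon}_{\min}(X;Y) \le \max_{P_X} \bigl( H^{\varepsilon_2}_{\min}(X) - H^{\varepsilon_1+\varepsilon_2+2\varepsilon}_{\max}(X|Y) \bigr) + \log\frac{1}{\varepsilon_1} .$$

Together with Theorem 3 this implies the following bound
on the single-serving channel capacity $C^{\varepsilon}_{\mathrm{comm}}(W)$.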

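Theorem 4: For all channels $W = P_{Y|X}$ and all $\varepsilon'$, $\varepsilon''$,
$\varepsilon > \varepsilon' + \varepsilon''$, $\varepsilon_1$, and $\varepsilon_2$, we have

$$\max_{P_X} \bigl( H^{\varepsilon'}_{\min}(X) - H^{\varepsilon''}_{\max}(X|Y) \bigr) - \log\frac{4\varepsilon}{(\varepsilon - \varepsilon' - \varepsilon'')^2} \le C^{\varepsilon}_{\mathrm{comm}}(W) \le \max_{P_X} \bigl( H^{\varepsilon_2}_{\min}(X) - H^{\varepsilon_1+\varepsilon_2+2\varepsilon}_{\max}(X|Y) \bigr) + \log\frac{1}{\varepsilon_1} .$$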

V. Conclusions 

Shannon entropy can be used to characterize a variety 
of information-processing tasks such as communication over 
noisy channels in the scenario where the primitive can be 
used independently many times. We have shown that smooth 
min- and max-entropies play a similar role in the more 
general single-serving case. In particular, we have given an 
explicit expression for the "single-serving channel capacity." 
We suggest as an open problem to find other such examples 
and contexts. 

The notion of conditional smooth entropies has recently 
been generalized to quantum information theory [11] (see 
also [7] for the non-conditional case). It is likely (but still
unproven) that, similarly to our classical Theorem 4, these
quantities can be used to characterize single-serving capacities
of quantum channels.